Get Page (Web Mining)
Synopsis
Gets a page via HTTP.
Description
This operator sends a GET request via HTTP. The returned page is output as a document.
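As a rough illustration of the behaviour described above (not the operator's actual implementation), the following Python sketch sends a GET request with the standard library and returns the response body as text. The function name `get_page`, the `timeout_s` argument, and the fallback to UTF-8 are assumptions made for this example.

```python
import urllib.request


def get_page(url: str, timeout_s: float = 10.0) -> str:
    """Send an HTTP GET request and return the response body as text.

    Hypothetical helper for illustration only.
    """
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        # Use the charset declared in the response headers;
        # falling back to UTF-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `get_page` with a URL would return the page text, which plays the role of the document delivered by the operator.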
Output
- output
The output port.
Parameters
- url
The URL to read from. Range:
- random_user_agent
Chooses a user agent at random from a set of 7000 user agents. Range:
- user_agent
The user agent property. Range:
- connection_timeout
The timeout (in ms) for the connection. Range:
- read_timeout
The timeout (in ms) for reading from the URL. Range:
- follow_redirects
Specifies whether redirects should be followed. Range:
- accept_cookies
Specifies whether cookies should be accepted. Range:
- cookie_scope
Specifies the scope of the cookies used. Range:
- request_method
Specifies the request method. Range:
- query_parameters
The query parameters as key/value pairs. Range:
- request_properties
Defines the properties that are sent with the HTTP request, so that the request matches the needs of your web service. Range:
- override_encoding
Normally, the encoding of the retrieved page is determined automatically. In rare cases this detection fails or the server reports an incorrect encoding. Enable this option to override the automatically detected encoding. Range:
- encoding
The encoding used for reading or writing files. Range:
- keep_sensitive_headers
Keeps the "Authorization" and "Cookie" headers during a redirect to a different domain or subdomain. Range:
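To make the role of some of these parameters more concrete, here is a minimal Python sketch of how comparable settings could be combined into a request and how an encoding override might be applied. It is not the operator's code: `build_request` and `decode_body` are hypothetical helpers, and treating `request_properties` as HTTP headers is an assumption of this example.

```python
import urllib.parse
import urllib.request


def build_request(url, query_parameters=None, user_agent=None,
                  request_properties=None, request_method="GET"):
    """Assemble a urllib Request from operator-style settings (hypothetical)."""
    if query_parameters:
        # Append the key/value pairs as a URL-encoded query string.
        separator = "&" if "?" in url else "?"
        url = url + separator + urllib.parse.urlencode(query_parameters)
    # Assumption: request properties correspond to HTTP request headers.
    headers = dict(request_properties or {})
    if user_agent:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers, method=request_method)


def decode_body(body, detected_encoding, override_encoding=False,
                encoding="utf-8"):
    """Decode bytes with the detected encoding unless an override is set."""
    if override_encoding:
        chosen = encoding
    else:
        # Falling back to UTF-8 when nothing was detected is an assumption.
        chosen = detected_encoding or "utf-8"
    return body.decode(chosen, errors="replace")
```

For example, `build_request("http://example.com/search", {"q": "web mining"})` yields a request for `http://example.com/search?q=web+mining`. Note that the operator's timeouts are given in milliseconds, whereas `urllib.request.urlopen` takes a single timeout in seconds, so a sketch like this would need a conversion before sending the request.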